Frequent word section extraction in a presentation speech by an effective dynamic programming algorithm.
نویسندگان
چکیده
Word frequency in a document has often been utilized in text searching and summarization. Similarly, identifying frequent words or phrases in a speech data set for searching and summarization would also be meaningful. However, obtaining word frequency in a speech data set is difficult, because frequent words are often special terms in the speech and cannot be recognized by a general speech recognizer. This paper proposes another approach that is effective for automatic extraction of such frequent word sections in a speech data set. The proposed method is applicable to any domain of monologue speech, because no language models or specific terms are required in advance. The extracted sections can be regarded as speech labels of some kind or a digest of the speech presentation. The frequent word sections are determined by detecting similar sections, which are sections of audio data that represent the same word or phrase. The similar sections are detected by an efficient algorithm, called Shift Continuous Dynamic Programming (Shift CDP), which realizes fast matching between arbitrary sections in the reference speech pattern and those in the input speech, and enables frame-synchronous extraction of similar sections. In experiments, the algorithm is applied to extract the repeated sections in oral presentation speeches recorded in academic conferences in Japan. The results show that Shift CDP successfully detects similar sections and identifies the frequent word sections in individual presentation speeches, without prior domain knowledge, such as language models and terms.
منابع مشابه
Presentation and Solving Non-Linear Quad-Level Programming Problem Utilizing a Heuristic Approach Based on Taylor Theorem
The multi-level programming problems are attractive for many researchers because of their application in several areas such as economic, traffic, finance, management, transportation, information technology, engineering and so on. It has been proven that even the general bi-level programming problem is an NP-hard problem, so the multi-level problems are practical and complicated problems therefo...
متن کاملAn Algorithm for Voiced / Unvoiced Decision and Pitch Estimation in Speech Feature Extraction
An algorithm which combines voice / unvoiced decision and pitch estimation is proposed in an enhanced process of MFCC feature extraction. The residual energy of LPC analysis and normalized autocorrelation are calculated and the static and dynamic thresholds are set for the voiced, unvoiced and transitional decision. Thus speech is divided into three classes that are voiced, unvoiced and transit...
متن کاملAn Efficient Unified Extraction Algorithm for Bilingual Data
The paper presents a unified algorithm for aligning sentences with their translations in bilingual data. The sentence alignment problem is handled as a large-scale pattern recognition problem similar to the task of finding the word sequence that corresponds to an acoustic input signal in isolated word automatic speech recognition (ASR). The algorithm gains efficiency from related work on dynami...
متن کاملOpen pit limit optimization using dijkstra’s algorithm
In open-pit mine planning, the design of the most profitable ultimate pit limit is a prerequisite to developing a feasible mining sequence. Currently, the design of an ultimate pit is achieved through a computer program in most mining companies. The extraction of minerals in open mining methods needs a lot of capital investment, which may take several decades. Before the extraction, the p...
متن کاملSystem for visual and experimental introduction to basic speech recognition algorithms
This contribution describes an educational system for visual presentation of basic speech recognition algorithms. It is aimed at introducing topics, like speech feature extraction, dynamic programming (DP) or the application of the hidden Markov model (HMM) theory. The system allows students to set up various types of recognition experiments with either pre-recorded or directly spoken words and...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- The Journal of the Acoustical Society of America
دوره 116 2 شماره
صفحات -
تاریخ انتشار 2004